Abstract


COVID-19 is a disease caused by a coronavirus: 'CO' stands for corona, 'VI' for virus, and 'D' for
disease. Formerly, this disease was referred to as the '2019 novel coronavirus'. Data mining is among
the most effective tools for analyzing and predicting hidden information from a pre-existing dataset.
The COVID-19 analysis and prediction in this paper consider the following parameters: state name,
total cases, today's cases, active cases, discharged cases, today's discharged cases, overall deaths,
and today's deaths. The dataset is analyzed and used for prediction with a statistical technique,
namely a regression model. Numerical illustrations are provided to support the results and
discussion.
1. Introduction

Data mining is the process of discovering hidden
patterns in pre-existing data. It is also known as data
discovery or knowledge discovery and supports
advanced data analysis [1]. The major steps in a
data mining process are locating the data, data
collection, data cleaning, integration, data selection,
data transformation, and knowledge discovery [2].
Weather forecasting, for example, collects as much
data as possible to determine the current state of the
atmosphere, measured by temperature, humidity,
and wind conditions [3]. Data mining techniques
make atmospheric conditions easier to understand
and, through regression analysis, help to predict
future conditions [4]. In data mining, normalization
is one of the most important steps for preparing a
suitable dataset in a uniform format. After
normalization, information recorded on different
scales is converted to a common scale. Several
normalization techniques are used in data analysis;
one of the most popular is maxima and minima
(min-max) normalization [5 - 8].
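
As a minimal sketch of min-max normalization, the snippet below rescales a list of values to the range [0, 1] using (v - min) / (max - min). The function name `min_max_normalize` is hypothetical and chosen only for illustration; the input values are the total cases listed for Tamil Nadu, Andhra Pradesh and West Bengal in Table 1.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: there is no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Total cases for Tamil Nadu, Andhra Pradesh and West Bengal (Table 1).
total_cases = [2449577, 1867017, 1489286]
scaled = min_max_normalize(total_cases)
print(scaled)
```

After rescaling, columns with very different magnitudes (for example total cases and today's deaths) share the same range and can be compared directly.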

2. Experimental Methods or Methodology
Regression analysis is a statistical tool for establishing a
relationship between two or more variables. One of these
variables is called the predictor variable, whose value is
collected through experiments. The other is called the
response variable, whose value is derived from the
predictor. The general mathematical equation for a
linear regression is

y = ax + b    (1)

where y is the response variable, x is the predictor
variable, and a and b are constants called the
coefficients.
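
The coefficients of the linear model in equation (1) are commonly estimated by ordinary least squares. The sketch below is one way to do this in plain Python, not necessarily the procedure used in this paper; the helper name `fit_line` is hypothetical. The predictor is total cases and the response is discharged cases, taken from the Tamil Nadu, Andhra Pradesh, Uttar Pradesh and West Bengal rows of Table 1.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope a and intercept b in y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Predictor: total cases; response: discharged cases (four rows of Table 1).
total = [2449577, 1867017, 1705014, 1489286]
discharged = [2367831, 1804844, 1679096, 1449462]

a, b = fit_line(total, discharged)
predicted = [a * x + b for x in total]
print(f"a = {a:.4f}, b = {b:.1f}")
```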


Table 1. COVID-19 overall dataset in India with different cases

State | Total | Today (25.6.2021) | Active | Discharged | Today discharged (25.6.2021) | Deaths | Today deaths
[name missing] | 6007431 | 9844 | 124911 | 5762661 | 9371 | 119859 | [missing]
[name missing] | 2854325 | 12078 | 100308 | 2741436 | 11469 | 12581 | [missing]
[name missing] | 2823444 | 3979 | 110546 | 2678473 | 9768 | 34425 | [missing]
Tamil Nadu | 2449577 | 6162 | 49845 | 2367831 | 9046 | 31901 | [missing]
Andhra Pradesh | 1867017 | 4981 | 49683 | 1804844 | 6464 | 12490 | [missing]
Uttar Pradesh | 1705014 | 224 | 3552 | 1679096 | 308 | 29366 | 30
West Bengal | 1489286 | 1923 | 20308 | 1449462 | 1952 | 17516 | 41
Delhi | 1433475 | 109 | 1767 | 1406760 | 131 | 24948 | [illegible]
Telangana | 617776 | 1088 | 16030 | 398139 | [illegible] | 3607 | 9
Puducherry | 115025 | 298 | 3077 | [illegible] | 276 | 4 | 3
Tripura | [illegible] | 369 | [illegible] | [illegible] | 400 | 662 | 2
Chandigarh | [illegible] | 22 | 247 | [illegible] | 42 | 807 | 0
Lakshadweep | 9601 | 42 | 32 | 9232 | 60 | 47 | [missing]
Andaman Nicobar | 7440 | 2 | 99 | [illegible] | 4 | 127 | [missing]
Table 2. Statistical observations in the COVID-19 dataset in India

Statistic | Total | Today (25.6.2021) | Active | Discharged | Today discharged (25.6.2021) | Deaths | Today deaths (25.6.2021)
[label missing] | 837067.91 | 1435.19 | 17024.11 | 809118.58 | 1792.417 | 10925 | 36.91
[label missing] | 1197052.50 | 2801.78 | 31802.81 | 1150068.83 | 3194.97 | 20740 | 97.21
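
Descriptive statistics of this kind can be computed per column with Python's standard `statistics` module. The sketch below assumes the two statistics of interest are the mean and the sample standard deviation; the paper does not state whether a sample or population formula was used, and the helper name `describe` is hypothetical. The input is the today's-cases column for Tamil Nadu, Andhra Pradesh, Uttar Pradesh and West Bengal from Table 1.

```python
import statistics


def describe(values):
    """Return the mean and the sample standard deviation of one column."""
    return statistics.mean(values), statistics.stdev(values)


# Today's cases (25.6.2021) for four states listed in Table 1.
today_cases = [6162, 4981, 224, 1923]

mean_today, sd_today = describe(today_cases)
print(f"mean = {mean_today:.2f}, standard deviation = {sd_today:.2f}")
```

Replacing `statistics.stdev` with `statistics.pstdev` gives the population form instead.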


Table 3. Regression model accuracy for training (80%) and testing (20%)
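
An 80% training and 20% testing evaluation can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the paper does not define "accuracy", so the coefficient of determination (R squared) on the held-out rows is assumed here, and the helper names `split_rows`, `fit_line` and `r_squared` as well as the fixed random seed are hypothetical. The rows are (total cases, discharged cases) pairs that are legible in Table 1.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows reproducibly and cut them into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(xs, ys, a, b):
    """1 - (residual sum of squares / total sum of squares); assumes ys are not all equal."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# (total cases, discharged cases) pairs that are legible in Table 1.
rows = [
    (6007431, 5762661), (2854325, 2741436), (2823444, 2678473),
    (2449577, 2367831), (1867017, 1804844), (1705014, 1679096),
    (1489286, 1449462), (9601, 9232),
]

train, test = split_rows(rows)
a, b = fit_line([x for x, _ in train], [y for _, y in train])
score = r_squared([x for x, _ in test], [y for _, y in test], a, b)
print(f"train rows = {len(train)}, test rows = {len(test)}, test R^2 = {score:.4f}")
```

With so few rows the held-out score depends strongly on which rows land in the test part, so a figure obtained this way is only indicative.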
3. Results and Discussion

The secondary COVID-19 data were collected
from the official website of the Government of India
(https://www.covid19india.org/). The website
provides a range of information related to
COVID-19.


Conclusions


This paper considers the following parameters:
state, total cases, today's cases, active cases,
discharged cases, today's discharged cases, overall
deaths, and today's deaths. The dataset is shown in
Table 1. Table 2 gives descriptive statistics, which
show the average number of cases in India and how
much each parameter deviates. The regression
model shows how closely pairs of parameters follow
a linear relationship, using Table 3 and Charts 1, 2
and 3. According to the numerical illustrations in
Table 3 and Chart 3, the regression model reaches
only 73% accuracy for total cases vs. today's cases.
The numerical illustrations in Charts 4 and 5 show
which parameters satisfy the linear regression
model and to what percentage they are linear.
Chart 6 reveals some hidden information: in
Kerala, positive cases are at a maximum, while the
death rate is the lowest among the states. In
Punjab, positive cases are low compared with other
states, but the death ratio is high. This is
highlighted graphically in Chart 6. For today's
cases vs. today's discharged cases (25.06.2021), the
regression model accuracy is 93%. For total cases
vs. discharged cases, the model accuracy is 99%.
This research concludes that total cases vs.
discharged cases gives the best performance for
future predictions.
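
The state-level comparison of death ratios can be illustrated with a short calculation. The paper does not define the ratio, so the sketch assumes it is overall deaths divided by total cases; the helper name `death_ratio` is hypothetical. The figures are the total cases and deaths listed for three states in Table 1.

```python
def death_ratio(deaths, total_cases):
    """Overall deaths as a fraction of total confirmed cases (assumed definition)."""
    return deaths / total_cases


# (total cases, deaths) for three states listed in Table 1.
states = {
    "Tamil_Nadu": (2449577, 31901),
    "Andhra_Pradesh": (1867017, 12490),
    "West Bengal": (1489286, 17516),
}

ratios = {name: death_ratio(deaths, total) for name, (total, deaths) in states.items()}
highest = max(ratios, key=ratios.get)

# List the states from the highest to the lowest ratio, as percentages.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {100 * ratio:.2f}%")
```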